Abstract: Distributed computation in artificial life and complex
systems is often described in terms of component operations
on information: information storage, transfer and modification. 
Information modification remains poorly described however, 
with the popularly-understood examples of glider and particle 
collisions in cellular automata being only quantitatively identified 
to date using a heuristic (separable information) rather than a 
proper information-theoretic measure. We outline how a recently- 
introduced axiomatic framework for measuring information re- 
dundancy and synergy, called partial information decomposition, 
can be applied to a perspective of distributed computation in 
order to quantify component operations on information. Using 
this framework, we propose a new measure of information 
modification that captures the intuitive understanding of informa- 
tion modification events as those involving interactions between 
two or more information sources. We also consider how the 
local dynamics of information modification in space and time 
could be measured, and suggest a new axiom that redundancy 
measures would need to meet in order to make such local 
measurements. Finally, we evaluate the potential for existing 
redundancy measures to meet this localizability axiom. 

I. Introduction 

Considering how variables are dynamically composed of in- 
formation from various sources is a topical subject in physics, 
complex systems and artificial life. For example, we have seen 
the dynamics of information studied in cellular automata [1]-[4],
brain-body-environment systems [5], financial systems [6],
models of gene regulatory networks [7], and the relation of
network structure to these dynamics [8].

There are several perspectives on how the composition or
"credit assignment" of information could be studied (e.g. [9]-[12]).
We study information dynamics through the lens of
distributed computation, focussing on operations of information
storage, transfer and modification [2]-[4], [12] (described
in Section III). This is because these terms are generally
well-understood (e.g. information transfer as directed coupling
between two nodes), especially in comparison to general notions
of complexity, and can be measured on any type of
time-series data. Furthermore, computation is the language in
which dynamics in complex systems are often described (e.g.
Langton's "Computation at the edge of chaos" [13]).

Crucially, this approach has provided key theoretic insights
into cellular automata (CAs), a critical proving ground for any
theory on the fundamental nature of distributed computation
in complex systems. CAs are discrete dynamical systems with
an array of cells that synchronously update their state as a
function of a fixed number of spatially neighboring cells using a
uniform rule [14]. Elementary CAs (ECAs) are 1D arrays of
binary state cells with one neighbor on either side. Studies of
computation in CAs typically focus on emergent structures,
such as domains, particles, and gliders. A domain is a set
of background configurations, any of which will update to
another such configuration in the absence of disturbances. Particles
are dynamic, coherent spatiotemporal structures against
this background: gliders are regular particles, and blinkers are
stationary gliders. The information dynamics approach provided
the first quantitative evidence [2]-[4] for the conjecture
[13] that blinkers are information storage entities, that particles
are associated with information transfer, and that particle
collisions correspond to information modification events.

Despite the success of this perspective, we do not have
a complete quantitative understanding of the notion of information
modification. It is often colloquially described as
the processing of information into a new form. It has been
viewed as a pivotal operation for biological neural networks
and models thereof [15]-[17], where it has been suggested as
a potential biological driver [16]. It is also a key operation in
collision-based computing [18]. As such, information modification
operations are likely to be required to support complex
behavior in artificial life and biological systems.

To be specific, information modification has been inter- 
preted to mean interactions between transmitted and/or stored 
information which result in a modification of one or the other 
[13]. This interpretation specifically juxtaposes modification
against storage and transfer, viewing it as a dynamic com- 
bination or synthesis of information from different sources. 
Modification therefore involves a non-trivial processing of 
information from two or more (storage or transfer) sources, 
rather than a trivial retrieval, movement or translation of one 
source of information alone. The separable information was 
introduced previously to study information modification [3].
Whilst it quantitatively identified particle collisions in cellular 
automata as modification events, the separable information is 
a heuristic rather than a proper information-theoretic measure. 

Much recent attention [11], [19]-[24] has been focused
on information-theoretic measures of redundancy and synergy
between information sources in creating outcomes in a target
or destination variable. These efforts began with the abstract,
axiomatic partial information decomposition (PID) framework
of Williams and Beer [11], as described in Section IV. The
concept of synergy, as formalized in the PID framework, is
particularly appealing for the notion of information modification
described above, as it explicitly quantifies the information
associated with two or more information sources that is not
present in any subset of those sources. In Section V, we
propose a measure of information modification based on the 
PID framework and its concept of synergy, and discuss its 
merits relative to previously proposed measures of information 
modification. In particular, we argue that (1) our measure 
clarifies the intertwined nature of information modification and 
transfer — with modification corresponding to the synergistic 
parts of transfer — and (2) our measure has the desirable 
property that modification events of various orders can be 
hierarchically decomposed into separately quantifiable terms. 

Furthermore, we describe in Section VI how, in order to
study the dynamics of such modification on a local scale
in space and time, we require the concrete measures of
redundancy and synergy applied via the PID framework to
be localizable themselves. We define a new axiom for such
concrete measures to satisfy in terms of localizability, but show
that $I_{\min}$ [11] (the most prominent redundancy measure) does
not satisfy it. Finally, we consider the future prospects for a
concrete measure that could be applied to properly quantify 
information modification on a local scale in space and time. 

II. Information theory 

In this section, we briefly introduce two key background 
concepts from information theory [25]-[27] related to our
analysis: the nature of redundant and synergistic contributions 
of two variables to the information in another, and the local 
value of information measures at specific observations. 

The mutual information (MI) between $X$ and $Y$ measures
the average reduction in uncertainty about $X$ that results
from learning the value of $Y$, or vice versa:
$I(X;Y) = H(X) - H(X \mid Y)$, where $H(X) = -\sum_x p(x) \log_2 p(x)$
and $H(X \mid Y) = -\sum_{x,y} p(x,y) \log_2 p(x \mid y)$ are the Shannon
entropy and conditional entropy respectively. The conditional
mutual information between $X$ and $Y$ given $Z$ is the MI
between $X$ and $Y$ when $Z$ is known:
$I(X;Y \mid Z) = H(X \mid Z) - H(X \mid Y, Z)$. One can consider the MI from two variables
$Y_1, Y_2$ jointly to another, $I(X; Y_1, Y_2)$, and decompose this
into the information carried by the first variable plus that
carried by the second conditioned on the first:
$I(X; Y_1, Y_2) = I(X; Y_1) + I(X; Y_2 \mid Y_1)$. It is crucial to understand that a
conditional MI $I(X;Y \mid Z)$ may be either larger or smaller
than the related unconditioned MI $I(X;Y)$ [27]; the conditioning
removes information redundantly held by the source
$Y$ and the conditioned variable $Z$ about $X$, but also includes
synergistic information about $X$ which can only be decoded
with knowledge of both the source $Y$ and conditioned variable
$Z$. These components cannot be teased apart with traditional
information-theoretic analysis; the partial information decomposition
(Section IV) was introduced for this purpose [11].
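The point that conditioning can either raise or lower an MI can be checked numerically. The following is a minimal sketch only, assuming two hypothetical toy distributions that are not taken from the paper: a purely synergistic one ($X = Y \oplus Z$ with fair independent bits) and a purely redundant one ($Y$ and $Z$ both copies of $X$). The helper names `marginal`, `entropy`, `mi` and `cmi` are illustrative.

```python
from collections import defaultdict
from itertools import product
from math import log2

def marginal(pmf, idx):
    """Marginalise a joint pmf {outcome tuple: prob} onto the positions in idx."""
    out = defaultdict(float)
    for outcome, p in pmf.items():
        out[tuple(outcome[i] for i in idx)] += p
    return out

def entropy(pmf, idx):
    """Shannon entropy (bits) of the variables at positions idx."""
    return -sum(p * log2(p) for p in marginal(pmf, idx).values() if p > 0)

def mi(pmf, a, b):
    """I(A;B) = H(A) + H(B) - H(A,B)."""
    return entropy(pmf, a) + entropy(pmf, b) - entropy(pmf, a + b)

def cmi(pmf, a, b, c):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)."""
    return (entropy(pmf, a + c) + entropy(pmf, b + c)
            - entropy(pmf, a + b + c) - entropy(pmf, c))

# Outcomes are tuples (x, y, z).
# Synergistic toy case (assumption): x = y XOR z, with y and z fair and independent.
xor = {(y ^ z, y, z): 0.25 for y, z in product((0, 1), repeat=2)}
# Redundant toy case (assumption): y and z are both copies of a fair bit x.
copy = {(b, b, b): 0.5 for b in (0, 1)}

mi_xor, cmi_xor = mi(xor, [0], [1]), cmi(xor, [0], [1], [2])
mi_copy, cmi_copy = mi(copy, [0], [1]), cmi(copy, [0], [1], [2])
print("XOR : I(X;Y) =", mi_xor, " I(X;Y|Z) =", cmi_xor)
print("copy: I(X;Y) =", mi_copy, " I(X;Y|Z) =", cmi_copy)
```

In the XOR case conditioning on $Z$ increases the MI (synergy is included), while in the copy case it decreases it (redundancy is removed); a single conditional MI value cannot say how much of each effect is present.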

Next, note that the aforementioned information-theoretic 
quantities are averages over all of the observations used to 
compute the relevant probability distribution functions (PDFs). 
One can also write down local or pointwise measures for each 
of these quantities, representing their value for one specific 
observation or configuration of the variables (x, y, z) being ob- 
served. The average of a local quantity over all observations is 
of course the relevant average information-theoretic measure. 
Applied to time-series data, local measures tell us about the 
dynamics of information in the system, since they vary with 
the specific observations in time, and local values are known 
to reveal more details about the system than the averages 
alone [1], [28]. For example, the local mutual information
[29] $I(X = x; Y = y) = i(x;y) = \log_2 \frac{p(x \mid y)}{p(x)}$
for a specific observation $(x, y)$ is the information held in
common between the specific values $x$ and $y$. (By convention,
we use lower case symbols for the local quantities.) Indeed,
the form of $i(x;y)$ is derived directly from four postulates [29,
ch. 2]: once-differentiability, similar form for conditional MI,
additivity (i.e. $i(\{y_n, z_n\}; x_{n+1}) = i(y_n; x_{n+1}) + i(z_n; x_{n+1} \mid y_n)$),
and separation for independent ensembles. This derivation also
means that $i(x;y)$ is uniquely specified, up to the base of the
logarithm. Of course, $I(X;Y) = \langle i(x;y) \rangle$, and like $I(X;Y)$,
$i(x;y)$ is symmetric in $x$ and $y$ (see further discussion in [30]).
Importantly, $i(x;y)$ may be positive or negative, meaning
that one variable can either positively inform us or actually
misinform us about the other. An observer is misinformed
where, conditioned on the value of $y$, the observed outcome
of $x$ was relatively unlikely as compared to the unconditioned
probability of that outcome (i.e. $p(x \mid y) < p(x)$).
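As an illustration of misinformative local values, the sketch below evaluates $i(x;y)$ on a small joint distribution. The distribution (two binary variables that agree with probability 0.8) is a hypothetical example chosen for this sketch, and `local_mi` is an illustrative helper name.

```python
from math import log2

def local_mi(p_xy, x, y):
    """i(x;y) = log2 [ p(x|y) / p(x) ] for a joint pmf {(x, y): prob}."""
    p_x = sum(p for (xi, _), p in p_xy.items() if xi == x)
    p_y = sum(p for (_, yi), p in p_xy.items() if yi == y)
    return log2(p_xy[(x, y)] / p_y / p_x)

# Hypothetical joint pmf (assumption): y usually agrees with x.
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

local_values = {xy: local_mi(p_xy, *xy) for xy in p_xy}
average_mi = sum(p * local_values[xy] for xy, p in p_xy.items())

for xy, value in sorted(local_values.items()):
    print(xy, round(value, 3))
print("average:", round(average_mi, 3))
```

The agreeing configurations carry positive local information, the disagreeing ones carry negative local information because $p(x \mid y) < p(x)$ there, and the probability-weighted average recovers the (non-negative) average MI.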

III. Information dynamics 

A local framework for information dynamics has recently
been introduced in [2]-[4], [12], [31]. This framework examines
how the next value $x_{n+1}$ of a destination variable is
computed in terms of how much of that information came
from the past state of that variable (information storage),
how much came from respective source variables (information
transfer), and how those information sources were combined
(information modification). The framework has a particular
focus on the dynamics of these operations in time and space,
and so provides spatiotemporal information profiles for each
measure. In this section, we describe how the framework
measures information storage and transfer, before considering
information modification in Section V.

A. Information storage 

Information storage is the amount of information from the 
past of a process that is relevant to or will be used at some 
point in its future. In terms of the dynamics of information 
processing, we focus on how much of the stored information 
is actually in use in computing the current value of the 
process. As such, the active information storage (AIS) $A_X$
was introduced [4] to explicitly measure how much of the
information from the past of a process $X$ is observed to be
in use in computing its next value. $A_X$ is the average MI
between realizations $x^{(k)}_n = \{x_{n-k+1}, \ldots, x_{n-1}, x_n\}$ of the
past state $X^{(k)}$ and the corresponding realizations $x_{n+1}$ of the
next value $X'$ of a given time series $X$:

$$ A_X(k) = I(X^{(k)}; X'). \tag{1} $$

Fig. 1. Local profile of AIS $a(i, n, k = 16)$ in bits for each cell $i$ for each
time step $n$ in (b), for the raw states of CA rule 54 in (a).



We require $k \to \infty$ in general, unless $x_{n+1}$ is conditionally
independent of the far past values $x^{(\infty)}_{n-k}$ given $x^{(k)}_n$ [4].



We can then extract the local active information storage
$a_X(n+1)$ [4] as the amount of information storage attributed
to the specific configuration or realization $(x^{(k)}_n, x_{n+1})$ at
time step $n+1$; i.e. the storage in use by the process at $n+1$:

\begin{align}
A_X(k) &= \langle a_X(n+1, k) \rangle_n, \tag{2} \\
a_X(n+1, k) &= i(x^{(k)}_n; x_{n+1}) = \log_2 \frac{p(x^{(k)}_n, x_{n+1})}{p(x^{(k)}_n)\, p(x_{n+1})}. \tag{3}
\end{align}



As a local MI, $a_X(n+1, k)$ may be positive or negative,
meaning the past history of the variable can either positively 
inform us or actually misinform us about its next state. 

As reported in [4] (with sample results in Fig. 1), when
applied to CAs the local AIS takes on large positive values in 
the domain and blinkers, since for these entities the next state 
is predictable from the destination's past. This was the first 
direct quantitative evidence that blinkers and domains were the 
dominant information storage entities in CAs. Furthermore, 
negative values are measured when gliders are encountered, 
because the past of the destination (being in the domain) would 
misinformatively predict domain continuation. 
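A local AIS profile of this kind can be sketched with a simple plug-in estimator. The code below is illustrative only: it assumes periodic boundaries, a random initial row, pooling of observations over all cells and time steps, and a short history $k = 4$ (far shorter than the $k = 16$ of Fig. 1) to keep the example small. The function names are hypothetical, not those of any toolkit.

```python
import random
from collections import Counter
from math import log2

def run_eca(rule, width, steps, seed=0):
    """Evolve an elementary CA with periodic boundaries from a random row."""
    rng = random.Random(seed)
    row = [rng.randint(0, 1) for _ in range(width)]
    rows = [row]
    for _ in range(steps - 1):
        # Wolfram convention: neighborhood (left, center, right) indexes a bit of `rule`.
        row = [(rule >> (4 * row[i - 1] + 2 * row[i] + row[(i + 1) % width])) & 1
               for i in range(width)]
        rows.append(row)
    return rows

def local_ais(rows, k):
    """Plug-in local AIS a(i, n+1, k), pooling observations over all cells and times."""
    samples = {}
    for n in range(k - 1, len(rows) - 1):
        for i in range(len(rows[0])):
            past = tuple(rows[m][i] for m in range(n - k + 1, n + 1))
            samples[(i, n + 1)] = (past, rows[n + 1][i])
    total = len(samples)
    joint = Counter(samples.values())
    pasts = Counter(past for past, _ in samples.values())
    nexts = Counter(nxt for _, nxt in samples.values())
    # log2 [ p(past, next) / (p(past) p(next)) ] with probabilities as relative counts
    return {key: log2(joint[(past, nxt)] * total / (pasts[past] * nexts[nxt]))
            for key, (past, nxt) in samples.items()}

rows = run_eca(rule=54, width=200, steps=200)
a = local_ais(rows, k=4)
average_ais = sum(a.values()) / len(a)
print("average AIS (bits):", round(average_ais, 3))
print("min / max local AIS:", round(min(a.values()), 3), round(max(a.values()), 3))
```

The average of the local values is the plug-in estimate of $A_X(k)$ and is therefore non-negative, while individual local values may be negative. Reproducing the structure described above would require a much longer history and far more observations than this sketch uses.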

B. Information transfer 

Information transfer is defined as the amount of information
that a source provides about a destination's next state in
the context of the destination's past. This definition pertains
to Schreiber's transfer entropy (TE) measure [32]. The TE
captures the average MI from realizations $y_n$ of a source^1 $Y$
to the corresponding realizations $x_{n+1}$ of the destination $X'$,
conditioned on realizations $x^{(k)}_n$ of the destination's previous
state $X^{(k)}$:

$$ T_{Y \to X}(k) = I(Y; X' \mid X^{(k)}). \tag{4} $$

^1 TE can consider realizations of the source state $y^{(l)}_n$. This is appropriate
where the observations $y$ mask a hidden causal process to $X$, or where
multiple past values of $Y$ in addition to $y_n$ are causal to $x_{n+1}$ [30].



Different values of $k$ produce different results here, but in
alignment with $A_X(k)$, in general one should take the limit
$k \to \infty$ here (except for similar conditional independence
cases), in order to properly interpret the transfer entropy as a
measure of information transfer [2], [30].

We can then extract the local transfer entropy $t_{Y \to X}(n+1)$ [2]
as the transfer attributed to the specific realization
$(x_{n+1}, x^{(k)}_n, y_n)$ at time step $n+1$; i.e. the amount of
information transferred from $Y$ to $X$ at $n+1$:

\begin{align}
T_{Y \to X}(k) &= \langle t_{Y \to X}(n+1, k) \rangle, \tag{5} \\
t_{Y \to X}(n+1, k) &= \log_2 \frac{p(x_{n+1} \mid x^{(k)}_n, y_n)}{p(x_{n+1} \mid x^{(k)}_n)} \tag{6} \\
&= i(y_n; x_{n+1} \mid x^{(k)}_n). \tag{7}
\end{align}



For proper interpretation as information transfer, $Y$ is constrained
among the $g$ causal information contributors to $X$,
say $Y \in \{Y_1, \ldots, Y_g\} \setminus X$ [30]. Importantly, the information
conditioned on by the TE is that provided by the AIS. 

Like local MI, local TE may be either positive or negative. 
As reported in [2], when applied to CAs it is typically strongly
positive at gliders when measured in the same direction as the 
glider's motion (e.g. information transfer across one cell to the 
right per unit time). Note: this result only holds for large k, 
i.e. when storage and transfer are properly separated. These 
results provided the first quantitative evidence for the long- 
held conjecture that particles are the dominant information 
transfer entities in CAs. Negative values imply that the source 
misinforms an observer about the next state of the destination 
in the context of the destination's past, and are typically found 
when TE is measured orthogonally to a moving glider. 
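The local TE of Eq. (6) can likewise be sketched with a plug-in estimator. This is a minimal illustration on a hypothetical pair of series (not CA data): $Y$ is a sequence of fair random bits and $X$ copies $Y$ with a one-step delay, so transfer is expected from $Y$ to $X$ but not from $X$ to $Y$. The function name `local_te` and the choice $k = 2$ are assumptions made for the example.

```python
import random
from collections import Counter
from math import log2

def local_te(source, dest, k):
    """Plug-in local transfer entropy t_{Y->X}(n+1, k) for discrete series."""
    obs = []
    for n in range(k - 1, len(dest) - 1):
        past = tuple(dest[n - k + 1:n + 1])          # x_n^(k)
        obs.append((dest[n + 1], past, source[n]))   # (x_{n+1}, x_n^(k), y_n)
    c_xpy = Counter(obs)
    c_py = Counter((past, y) for _, past, y in obs)
    c_xp = Counter((x, past) for x, past, _ in obs)
    c_p = Counter(past for _, past, _ in obs)
    # log2 [ p(x_{n+1} | past, y_n) / p(x_{n+1} | past) ]
    return [log2((c_xpy[(x, past, y)] / c_py[(past, y)])
                 / (c_xp[(x, past)] / c_p[past]))
            for x, past, y in obs]

rng = random.Random(1)
y = [rng.randint(0, 1) for _ in range(5000)]
x = [0] + y[:-1]                      # x_{n+1} = y_n (hypothetical coupling)

te_yx = local_te(y, x, k=2)
te_xy = local_te(x, y, k=2)
avg_yx = sum(te_yx) / len(te_yx)
avg_xy = sum(te_xy) / len(te_xy)
print("T(Y->X) ~", round(avg_yx, 3), " T(X->Y) ~", round(avg_xy, 3))
```

Here the estimated transfer from $Y$ to $X$ is close to 1 bit, while the reverse direction is zero because the value $x_n$ is already contained in the conditioned past of $Y$.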

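TE can also be conditioned on other possible sources $Z$ to
account for their effects on the destination. The conditional
transfer entropy was introduced for this purpose [2], [3]:

\begin{align}
T_{Y \to X \mid Z}(k) &= I(Y; X' \mid X^{(k)}, Z), \tag{8} \\
T_{Y \to X \mid Z}(k) &= \langle t_{Y \to X \mid Z}(n+1, k) \rangle, \tag{9} \\
t_{Y \to X \mid Z}(n+1, k) &= \log_2 \frac{p(x_{n+1} \mid x^{(k)}_n, z_n, y_n)}{p(x_{n+1} \mid x^{(k)}_n, z_n)} \tag{10} \\
&= i(y_n; x_{n+1} \mid x^{(k)}_n, z_n). \tag{11}
\end{align}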



We specifically refer to the conditional TE as the complete
transfer entropy ($T^c_{Y \to X}(k)$ and $t^c_{Y \to X}(n+1, k)$) when it
conditions on all other causal sources $Z$ to the destination
$X$ [2]. For clarity then, we refer to $T_{Y \to X}$ simply as the
apparent transfer entropy [2]. As conditional MI terms, these
TEs may be larger or smaller than the unconditioned MIs;
we consider how such redundancies and synergies can be 
specifically measured in the next section. 

Finally, note that one can decompose the MI from the
sources to destination as a sum of incrementally conditioned
MI terms [3], [30]; e.g. for a two source system:

\begin{align}
I(X'; X^{(k)}, Y_1, Y_2) &= I(X'; X^{(k)}) + I(X'; Y_1 \mid X^{(k)}) + I(X'; Y_2 \mid X^{(k)}, Y_1), \tag{12} \\
&= A_X(k) + T_{Y_1 \to X}(k) + T^c_{Y_2 \to X}(k).
\end{align}

This equation could be reversed in the order of $Y_1$ and $Y_2$, and
its correctness is independent of $k$ (so long as $k$ is large enough
to capture the causal sources in the past of the destination).
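The chain-rule decomposition of Eq. (12), and its validity under either ordering of $Y_1$ and $Y_2$, can be verified numerically. The sketch below assumes a hypothetical random joint distribution over four binary variables, with a single variable `m` standing in for the past state $X^{(k)}$; it is a consistency check, not an estimator for real time series.

```python
import random
from collections import defaultdict
from itertools import product
from math import log2

def entropy(pmf, idx):
    """Shannon entropy (bits) of the variables at tuple positions idx."""
    marg = defaultdict(float)
    for outcome, p in pmf.items():
        marg[tuple(outcome[i] for i in idx)] += p
    return -sum(p * log2(p) for p in marg.values() if p > 0)

def cmi(pmf, a, b, c=()):
    """I(A;B|C) from joint entropies; c=() gives the unconditioned MI."""
    a, b, c = list(a), list(b), list(c)
    return (entropy(pmf, a + c) + entropy(pmf, b + c)
            - entropy(pmf, a + b + c) - entropy(pmf, c))

# Hypothetical random joint pmf over binary (x_next, m, y1, y2); m stands in for X^(k).
rng = random.Random(0)
weights = {o: rng.random() for o in product((0, 1), repeat=4)}
norm = sum(weights.values())
pmf = {o: w / norm for o, w in weights.items()}

X, M, Y1, Y2 = [0], [1], [2], [3]
total = cmi(pmf, X, M + Y1 + Y2)            # I(X'; X^(k), Y1, Y2)
storage = cmi(pmf, X, M)                    # active information storage term
apparent_y1 = cmi(pmf, X, Y1, M)            # apparent TE from Y1
complete_y2 = cmi(pmf, X, Y2, M + Y1)       # complete TE from Y2
apparent_y2 = cmi(pmf, X, Y2, M)            # reversed ordering
complete_y1 = cmi(pmf, X, Y1, M + Y2)

print(total, storage + apparent_y1 + complete_y2, storage + apparent_y2 + complete_y1)
```

Both orderings sum to the same total, although the individual apparent and complete terms generally differ between orderings; which parts of these terms are redundant or synergistic is not resolved by the chain rule itself.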

IV. Partial information decomposition 

A. Abstract definition 

The PID framework provides a general method of decomposing
the information $I(X; \mathbf{A})$ that a set of source variables
$\mathbf{A} = \{A_1, \ldots, A_r\}$ provide about a destination variable $X$
[11]. The core idea underlying this method is a measure of redundancy
$I_\cap(X; A_1, \ldots, A_r)$, which captures the overlapping
information that sources $A_1, \ldots, A_r \subseteq \mathbf{A}$ (which may be joint
variables in general) share about the destination $X$. Intuitively,
redundancy acts on information sources like the intersection
operator acts on sets, capturing the information that is common
to all sources. Indeed, the redundancy measure $I_\cap$ is defined
by the following axioms, each of which is analogous to a basic
property of set intersection:

Axiom 1. Symmetry: $I_\cap$ is symmetric in the $A_i$'s.
Axiom 2. Self-redundancy: $I_\cap(X; A_i) = I(X; A_i)$.
Axiom 3. Monotonicity: $I_\cap(X; A_1, \ldots, A_{r-1}, A_r) \le I_\cap(X; A_1, \ldots, A_{r-1})$, with equality if $A_{r-1} \subseteq A_r$.



Using $I_\cap$ and a form of inclusion-exclusion, the PID
framework specifies how the total information $I(X; \mathbf{A})$ decomposes
into a sum of PI-terms, given by the function
$I_\partial$. In the simplest case of two source variables, the total
information $I(X; A_1, A_2)$ decomposes into:
a. the redundant information about $X$ which is shared by both $A_1$ and $A_2$:
$I_\partial(X; \{A_1\}\{A_2\}) = I_\cap(X; A_1, A_2)$;
b. the unique information from $A_1$ (resp. $A_2$):
$I_\partial(X; \{A_1\}) = I(X; A_1) - I_\cap(X; A_1, A_2)$; and
c. the synergistic information which can only be identified when $A_1$ and $A_2$ are considered jointly as $\{A_1, A_2\}$:
$I_\partial(X; \{A_1, A_2\}) = I(X; A_1, A_2) - I(X; A_1) - I(X; A_2) + I_\cap(X; A_1, A_2)$.
The relationships between synergy,
redundancy, and unique information can be represented using
a PI-diagram (see Fig. 2), which shows the set-theoretic
breakdown of $I(X; A_1, A_2)$ into PI-terms. Without a valid
measure for redundancy, it would not be possible to separately
measure these four PI-terms using only the three independent
standard information-theoretic terms $I(X; A_1, A_2)$, $I(X; A_1)$
and $I(X; A_2)$. The PI-diagram for three source variables is
shown in Fig. 3, and from this the general structure of PI
decomposition can be seen. In general, the PI-term $I_\partial(X; \alpha)$
for a collection of sources $\alpha$ corresponds to the information
provided redundantly by the synergies of all sources in the
collection, corresponding to one distinct way for the source
variables to contribute information about the destination.

Fig. 2. Partial information diagram of information $I(X; M, Y)$ in $X$
from two source variables $M$, $Y$ (ignoring the colors). $\{M\}\{Y\}$ represents
the redundant information in the two sources, $\{M\}$ and $\{Y\}$ represent the
unique information from each source, and $\{M, Y\}$ represents the synergistic
information from the sources. If we consider $M$ to be the past state $X^{(k)}$
of the destination $X$, and $Y$ as another causal source, then this PI-diagram
partitions the AIS (white) and TE (green). (This is called the PI-diagram for
three variables in [11], including the destination variable.)

Put another way, $I_\partial(X; \alpha)$ is "the information provided redundantly
by the sources of $\alpha$ that is not provided by any simpler
collection of sources" [11], where any simpler collection $\beta$ is
lower than $\alpha$ on the hierarchy (or redundancy lattice) of the
set-theoretic breakdown of $I(X; \mathbf{A})$:



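$$ I_\partial(X; \alpha) = I_\cap(X; \alpha) - \sum_{\beta \prec \alpha} I_\partial(X; \beta). \tag{13} $$

The boundary case is for $\alpha$ with no simpler collection of
sources, where $I_\partial(X; \alpha)$ is simply the redundancy $I_\cap(X; \alpha)$.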

B. The $I_{\min}$ measure for redundancy

The abstract formulation of PI decomposition works for any
redundancy measure that satisfies the axioms for $I_\cap$. However,
to actually compute PI-terms, a concrete redundancy measure
satisfying this axiomatic definition is needed. Williams and
Beer proposed the following candidate measure [11]:

$$ I_{\min}(X; A_1, \ldots, A_r) = \sum_x p(x) \min_{A_j} I(X = x; A_j), \tag{14} $$

$$ I(X = x; A) = \sum_a p(a \mid x) \left[ \log_2 \frac{1}{p(x)} - \log_2 \frac{1}{p(x \mid a)} \right]. $$

PI-terms $I_\partial(X; A_1, \ldots, A_r)$ which are measured using $I_{\min}$
for $I_\cap$ are labeled as $\Pi(X; A_1, \ldots, A_r)$.
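To make Eq. (14) and the two-source PI-terms concrete, the following sketch evaluates them on two small logic gates. It is an illustration under stated assumptions: the distributions (XOR and AND of fair independent bits) are hypothetical examples rather than results from the paper, the target is assumed to sit at tuple position 0 with the sources at positions 1 and 2, and the helper names are invented for this sketch.

```python
from collections import defaultdict
from itertools import product
from math import log2

def marginal(pmf, idx):
    """Marginalise a joint pmf {outcome tuple: prob} onto the positions in idx."""
    out = defaultdict(float)
    for outcome, p in pmf.items():
        out[tuple(outcome[i] for i in idx)] += p
    return out

def mi(pmf, a, b):
    """I(A;B) = H(A) + H(B) - H(A,B)."""
    h = lambda idx: -sum(p * log2(p) for p in marginal(pmf, idx).values() if p > 0)
    return h(a) + h(b) - h(a + b)

def specific_info(pmf, x, src):
    """I(X = x; A) = sum_a p(a|x) [log2 1/p(x) - log2 1/p(x|a)], with X at position 0."""
    p_x = marginal(pmf, [0])[(x,)]
    p_a = marginal(pmf, src)
    p_xa = marginal(pmf, [0] + src)
    total = 0.0
    for a, pa in p_a.items():
        pxa = p_xa.get((x,) + a, 0.0)
        if pxa > 0:
            total += (pxa / p_x) * (log2(1 / p_x) - log2(pa / pxa))
    return total

def i_min(pmf, sources):
    """Eq. (14): expected minimum specific information over the sources."""
    return sum(px * min(specific_info(pmf, x, s) for s in sources)
               for (x,), px in marginal(pmf, [0]).items())

def pid2(pmf):
    """Two-source PI-terms for target at position 0 and sources at positions 1, 2."""
    red = i_min(pmf, [[1], [2]])
    i1, i2, i12 = mi(pmf, [0], [1]), mi(pmf, [0], [2]), mi(pmf, [0], [1, 2])
    return {"redundant": red, "unique_1": i1 - red, "unique_2": i2 - red,
            "synergy": i12 - i1 - i2 + red}

xor_gate = {(a ^ b, a, b): 0.25 for a, b in product((0, 1), repeat=2)}
and_gate = {(a & b, a, b): 0.25 for a, b in product((0, 1), repeat=2)}
print("XOR:", pid2(xor_gate))
print("AND:", pid2(and_gate))
```

For XOR all of the information is synergistic, whereas for AND this measure assigns a mix of redundancy and synergy with no unique information. The four terms always sum to $I(X; A_1, A_2)$ by construction.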

$I_{\min}$ measures redundancy as the minimum amount of information
which can be found in any source $A_j$. This has been
criticized since it does not specifically require each source to
hold the same information, as demonstrated with the "two-bit
copy problem" [21], [22], [24], which is the observation
that $I_{\min}(\{A_1, A_2\}; A_1, A_2) = 1$ bit for independent random
bits $A_1, A_2$. This observation prompted the proposal of a new
axiom for $I_\cap$ [22]:

Axiom 4. Identity: $I_\cap(\{A_1, A_2\}; A_1, A_2) = I(A_1; A_2)$.
Alternative measures of redundancy which satisfy this
additional axiom have been proposed by Harder et al. [22]
and Griffith and Koch [21]. We describe these briefly in
Section VI-C, though focus on $I_{\min}$ in our current study as
the originally-presented concrete measure.

Fig. 3. PI-diagram of information in $X$ decomposed from three source
variables $M$, $Y_1$, $Y_2$ (ignoring the colors). If we consider $M$ to be the past
state $X^{(k)}$ of the destination $X$, and $Y_1$ and $Y_2$ as two other causal sources,
then this PI-diagram partitions the AIS (red) and transferred information (all
other information; blue and purple here). The transferred information (from
two sources) can be further partitioned into apparent TE from $Y_1$ (blue),
then complete TE from $Y_2$ (purple). (This is called the PI-diagram for four
variables in [11], including the destination variable.)

C. PI-decomposition of information dynamics

PID can clearly be applied to the information sources for a
destination as defined by information dynamics for distributed
computation; i.e. the set $S_{DC} = \{X^{(k)}, Y_1, \ldots, Y_g\}$, including
the previous state of the destination, and the other causal
sources. This is a partitioning of the information in the next
state of the destination variable into information storage and
complex transfer terms, and their sub-components. Fig. 3
shows the PI-diagram for these components; the identification
of AIS and apparent TE in this PI-diagram was first given in
[20], and is akin to the decomposition given in Eq. (12).

Considering the apparent TE $T_{Y \to X}(k)$ as a conditional MI,
Williams and Beer [19] note that it is composed of a unique
component $I_\partial(X'; \{Y\})$ from the source $Y$ (state-independent
TE) plus a synergistic component $I_\partial(X'; \{Y, X^{(k)}\})$ from the
source $Y$ interacting with the past state $X^{(k)}$ (state-dependent
TE) (see Fig. 2). The case for the conditional/complete TE is
more complicated again (see Fig. 3), where there are many
more varieties of synergistic components involved. Similarly,
Flecker et al. [20] suggested that breaking down the PI-terms
of the storage and transfer measures can reveal further insights
into the local dynamics of a system. (We will revisit the
approach to localizing these components in Section V-A.)

Finally, note the role of the past state of the destination $X^{(k)}$
as a joint source here. Using different values of $k$ changes
the values of the PI-terms, redistributing the decomposition of
the information amongst them. (The information attributed to
storage in $I(X'; X^{(k)})$ is non-decreasing with larger $k$, which
may decrease information in other PI-terms.) For our purposes
$k \to \infty$ should be used, to align with proper measurement of
information storage and transfer (as described in Section III).
The use of large $k$ for $X^{(k)}$ is not about gathering all causal
sources in the past of the destination (indeed, it's unlikely
that most of these values will be directly causal to $X'$). It
is about providing context for our analysis, or providing the
perspective of distributed computation [2], [4] by properly
identifying information storage and transfer in the PI-diagram.

V. Modified and non-modified information 

Given our view in Section I of information modification as
the synthesis of information from more than one information
storage or transfer source, the PID has an obvious
application here. In this section, we first briefly review recent
initial approaches to measuring information modification, before
proposing how to properly capture it in the PI-diagram.

A. Initial approaches 

The separable information was introduced by Lizier et al.
[3] to capture the information gathered by an observer about
the next state of $X$ from separate inspection of the storage
and transfer sources. Locally, it is defined simply as:

$$ s_X(n, k) = a_X(n, k) + \sum_{Y \in \{Y_1, \ldots, Y_g\} \setminus X} t_{Y \to X}(n, k). \tag{15} $$



The intuition behind the separable information was that local
AIS and TE become negative where unconsidered sources
act strongly to create an outcome in the destination. It was
hypothesized that if $s_X(n, k) < 0$, then no source provides
strong positive information about the outcome when inspected
individually and a non-trivial information modification must be
taking place. Indeed, $s_X(n, k)$ was the first method to directly
identify particle collisions in CAs as information modification
events [3]. However, it was acknowledged in [3] that $s_X(n, k)$
ignored interaction or redundancies between the sources, and
indeed with the mechanics of PID available, Flecker et al. [20]
identified which components in the PI-diagram of Fig. 3 were
double-counted and ignored by $s_X(n, k)$. As such, $s_X(n, k)$
remains a heuristic rather than a measure, though it guides
us in the right direction. It seems that $s_X(n, k) < 0$ was a
good predictor of modification events because $s_X(n, k) < 0$
events are likely to have strong synergistic components in the
PI-diagram, and these synergistic components are more likely
to measure the information modification.



Building on these insights, Flecker et al. [20] suggested that
a more natural way of "quantifying the extent to which the
whole contributes information beyond the sum of the parts" for
ECAs would be the 3-way synergy $\Pi(X'; \{X^{(k)}, Y_1^{(k)}, Y_2^{(k)}\})$
between $X^{(k)}$ and the two neighboring causal sources $Y_1$ and
$Y_2$ (akin to the outer-most PI-term in Fig. 3, but with full
states of $Y_1$ and $Y_2$ instead of single values). This generalizes
as the highest-order synergy term in the PI-diagram between
the storage and transfer sources. While this is certainly a
proper information-theoretic measure, it did not work as well
in identifying particle collisions in complex CA rules [20]. A
possible factor was the perspective in [20] that transfer and
modification were mutually exclusive concepts. This would
(as discussed later) ignore the state-dependent TE [19], a constituent
of information transfer which captures the interaction
between the source and the past state of the destination. This
may have led the identified measure to miss some possible
contributions to the modification (i.e. lower-order synergy
terms). Furthermore, the localisation of the PI terms in [20]
was a sliding window, which as discussed in Section VI does
not properly attribute a local value to a specific configuration.

B. Requirements for a measure of information modification 

Having evaluated these attempts to measure information
modification in distributed computation, we propose the following
requirements that a measure of information modification
$M_X$ should satisfy. $M_X$ should:

1) be a proper information-theoretic quantity;

2) examine the interaction between the information storage
$X^{(k)}$ and causal transfer sources $Y \in \{Y_1, \ldots, Y_g\}$;

3) allow local measurement $m_X$ at specific observed
configurations $(x_{n+1}, x^{(k)}_n, y_{1,n}, \ldots, y_{g,n})$ (defined in
more detail in Section VI);

4) be extendible to an arbitrary number of sources $g$.

Clearly, the separable information fails to satisfy requirement 1,
while the 3-way synergy as localized via sliding
windows in [20] does not satisfy requirement 3.

Also, we expect that requirement 2, which gives the perspective
of distributed computation in using the past state of
the destination $X^{(k)}$, will be important (i.e. using $k = 1$ say
would not suffice). This is because we know that measures
of information storage and transfer do not properly align with
our understanding of these concepts without large $k$ [2], [4],
and similarly large $k$ was required for the precursor heuristic
separable information to identify collision points in CAs.

C. Partitioning modified and non-modified information 

We return to our accepted definition of information modification
as interactions between transmitted and/or stored
information which result in a modification of one or the other.
We expect to split the total information $I(X'; S_{DC})$ about
the destination $X'$ from the information sources $S_{DC} =
\{X^{(k)}, Y_1, \ldots, Y_g\}$ into modified information $M_X$ and non-modified
information $I(X'; S_{DC}) - M_X$.

As hinted at previously, we identify the non-modified information
in the destination $X'$ as any information that is
identifiable in any one of the information sources in $S_{DC}$
examined individually. In terms of PID, this is the sum of all
PI-terms which consider collections of joint sources where (at
least) one set of joint sources is only a single source:

$$ I(X'; S_{DC}) - M_X = \sum_{\substack{\beta \preceq S_{DC} \\ \exists \gamma \in \beta,\ |\gamma| = 1}} I_\partial(X'; \beta). \tag{16} $$

Fig. 4. PI-diagram of information in a destination $X$ from three source
variables $Y_1$, $Y_2$, and $M = X^{(k)}$, identifying: a. non-modified information
(no color), which could be decoded by examining individual sources only,
and b. modified information (light-blue and green regions), composed of the
information about the destination that could only be decoded by looking at
all 3 sources $I_\partial^{(o=3)}$ (light-blue), and the information that could be decoded
by examining only 2 sources together (but not singles) $I_\partial^{(o=2)}$ (green).



Conversely then, we can define $M_X$ directly from Eq. (16).
Equivalently, we can say that the modified information $M_X$
is the sum of all synergy terms in the PI-diagram for
$I(X'; S_{DC})$; i.e. all atoms in the PI-diagram which consider
collections of joint sources, where no set of joint sources in
the collection only considers a single source:

$$ M_X = \sum_{\substack{\beta \preceq S_{DC} \\ \forall \gamma \in \beta,\ |\gamma| > 1}} I_\partial(X'; \beta). \tag{17} $$



$M_X$ includes any information that cannot be found in one of
the sources examined individually, i.e. that which is produced
from a non-trivial combination of information from two or
more sources in $S_{DC}$. Both modified and non-modified information
can be easily identified on the PI-diagram (see Fig. 4).

This approach is along the lines suggested in [20], but
includes more PI-terms. The key difference is that we include
any PI-terms whose collections of joint variables contain at
minimum two variables; as such, this measure includes all
synergistic information terms.^2 In comparison to [20], we do
not consider the concepts of information transfer and modification
to be mutually exclusive. As shown by the decomposition
in Fig. 3, all of the information in the destination $X$ is either
stored information from its past, or (some type of) transferred
information from the other sources. Our view is that modified
information is simply the synergistic parts of such information
transfer. To clarify this point with a simpler example,
consider the two "source" PI-diagram in Fig. 2. Here, our
approach would label the synergy term $I_\partial(X'; \{X^{(k)}, Y\})$ as
the information modification $M_X$, and note that in this case the
quantity is precisely equal to the state-dependent TE, which
is a constituent of information transfer [19].

^2 Also, including spurious uncorrelated sources in addition to $\{Y_1, \ldots, Y_g\}$
will remove all information in the highest-order synergy term used in [20],
yet $M_X$ remains the same since it still counts all synergistic PI-terms.

D. Hierarchy of orders of interaction 

We can also define a hierarchy of the decomposition, in 
terms of the minimum number of interacting joint sources 
that information about the destination could be found in. 
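For a generic PI-diagram with sources $\{A_1, \ldots, A_r\}$, the
information which could be decoded from only $o$ sources but
not $o - 1$ sources is:

$$I_\partial^{(o)}(X'; \{A_1, \ldots, A_r\}) = \sum_{\substack{\beta \preceq \{A_1, \ldots, A_r\} \\ \min_{\gamma \in \beta}(|\gamma|) = o}} I_\partial(X'; \beta), \qquad (18)$$

$$I(X'; \{A_1, \ldots, A_r\}) = \sum_{o=1}^{r} I_\partial^{(o)}(X'; \{A_1, \ldots, A_r\}).$$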

We note that this addresses the goal of [9], [10], to achieve
a partitioning of information in a given variable or collective
into a hierarchy of contributions from individual sources, from
pairs of sources that was not contained in individuals, etc. In
comparison to these approaches however, $I_\partial^{(o)}$ avoids
problematic double-counting and the use of the negative "interaction
information" [11] (unlike [9]), and (depending on the concrete
implementation of $I_\partial$) is model-free (unlike [10]).
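As a worked instance of Eq. (18), consider the two-source case $\{X^{(k)}, Y\}$ of Fig. 2. Writing each atom by its collection of joint source sets (a shorthand assumed here for illustration, which may differ from the labelling used in the figures), the hierarchy reduces to:

$$I_\partial^{(1)}(X'; \{X^{(k)}, Y\}) = I_\partial(X'; \{X^{(k)}\}\{Y\}) + I_\partial(X'; \{X^{(k)}\}) + I_\partial(X'; \{Y\}),$$

$$I_\partial^{(2)}(X'; \{X^{(k)}, Y\}) = I_\partial(X'; \{X^{(k)}, Y\}),$$

$$I(X'; \{X^{(k)}, Y\}) = I_\partial^{(1)}(X'; \{X^{(k)}, Y\}) + I_\partial^{(2)}(X'; \{X^{(k)}, Y\}).$$

The order-1 term collects the redundant and the two unique atoms, while the order-2 term is the single synergy atom, so in this case the modified information is exactly the order-2 term.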

Using the distributed computation sources $S_{DC}$, we have:

$$M_X = I(X'; S_{DC}) - I_\partial^{(o=1)}(X'; S_{DC}), \qquad (19)$$

$$M_X = \sum_{o=2}^{G+1} I_\partial^{(o)}(X'; S_{DC}), \qquad (20)$$

and clearly for the three source case $\{X^{(k)}, Y_1, Y_2\}$ in Fig. 4
we have $M_X = I_\partial^{(o=2)} + I_\partial^{(o=3)}$.


E. Modified information in ECAs 

We apply our definition of modified information to several
important ECA rules, using the $I_{min}$ candidate redundancy
measure [11] to compute $M_X$ (as implemented in the
publicly available software [33]). Our results in Table I show
that for simple, ordered CA rules, non-modified information
dominates the decomposition of the next state of a cell.
Conversely, for chaotic CAs (rules 18, 22 and 30), modified
information dominates, resulting from synergistic interactions
between sources. The complex CAs (rules 54 and 110)
however seem to have a mix of modified and non-modified
information. These results make intuitive sense, and align
with previous observations in both CAs and random Boolean
networks that chaotic dynamics tend to be dominated by
higher-order information transfer terms [1], [3], [7].

TABLE I
Measurements (in bits) to 3 d.p. of the hierarchies of modified
and non-modified information in ECAs, using the $I_{min}$
redundancy measure. We use observations of 100 repeat runs
of CAs of length 200, each run for 200 time steps, with history length
k = 16 here except for k = 1 in the last column.

Rule   $I_\partial^{(o=1)}$   $I_\partial^{(o=2)}$   $I_\partial^{(o=3)}$   $M_X$ (k = 16)   $M_X$ (k = 1)
18     0.273                  0.464                  0.087                  0.551            0.691
22     0.188                  0.188                  0.559                  0.747            0.916
30     0.189                  0.558                  0.253                  0.811            0.812
54     0.705                  0.087                  0.205                  0.292            0.860
110    0.689                  0.177                  0.121                  0.298            0.899
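The columns of Table I can be cross-checked against Eqs. (19)-(20): for three sources, $M_X$ at k = 16 should equal the sum of the order-2 and order-3 terms. The short Python sketch below performs that check on the tabulated values (within the 3 d.p. rounding of the table) and also reports the modified fraction of the total information; the dictionary layout and function names are illustrative choices, not part of the original analysis.

```python
# Values copied from Table I (bits, k = 16):
# rule -> (order-1 term, order-2 term, order-3 term, M_X)
TABLE_I = {
    18:  (0.273, 0.464, 0.087, 0.551),
    22:  (0.188, 0.188, 0.559, 0.747),
    30:  (0.189, 0.558, 0.253, 0.811),
    54:  (0.705, 0.087, 0.205, 0.292),
    110: (0.689, 0.177, 0.121, 0.298),
}


def rows_consistent(tol=1.5e-3):
    """True if M_X matches (order 2 + order 3) in every row, up to rounding."""
    return all(abs((o2 + o3) - m_x) <= tol
               for _, o2, o3, m_x in TABLE_I.values())


def modified_fraction(rule):
    """Share of the total information about the next state that is modified."""
    o1, o2, o3, _ = TABLE_I[rule]
    return (o2 + o3) / (o1 + o2 + o3)


if __name__ == "__main__":
    print("rows consistent:", rows_consistent())
    for rule in TABLE_I:
        print(rule, round(modified_fraction(rule), 3))
```

All five rows satisfy the identity. The chaotic rules (18, 22, 30) have a modified fraction above one half, while rules 54 and 110 fall below it.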

The same analysis run with only k = 1 past value for
$X^{(k)}$ does not provide the same insight, in fact identifying
large amounts of information from triplet interactions for all
the rules. This is because using k = 1 does not adequately
partition information storage and transfer, and so does not
achieve a proper perspective of distributed computation (as
expected from Section V-B).

We would like to evaluate the dynamics of information
modification in space and time - in the same manner as shown
for AIS in Fig. 1 - since this will reveal whether they relate to
particle collisions in CAs. To do so, we require the ability to
compute the value of PI-terms on a local rather than average
scale, and we consider this in the next section.

VI. Localising PI-terms 

The ability to localize PI-terms depends on the ability to
localize the measure of redundancy $I_\cap$ to obtain relevant
local values $i_\cap$. Local PI-terms $i_\partial$ would be the sums of the
relevant $i_\cap$, as per the standard values. However, a property of
localizability of the abstract measure $I_\cap$ does not follow from
its definition by the original minimal set of axioms in [11],
and so at this stage the localizability will be a property of
the concrete measure (e.g. $I_{min}$) one selects to implement $I_\cap$.
Here we consider how one may define localizability of $I_\cap$ in
terms of a further axiom, and subsequently consider whether
the candidate concrete measures satisfy this axiom.

A. Localizing redundancy $I_\cap$

For a candidate redundancy measure to be localizable (as
defined for traditional measures in Section II), it must satisfy
the following additional axiom for $I_\cap(X; A_1, \ldots, A_r)$:

Axiom 5. (localizability) There exists a local measure
$i_\cap(x; a_1, \ldots, a_r)$ for the redundancy of a specific observation
$\{x, a_1, \ldots, a_r\}$ of $\{X, A_1, \ldots, A_r\}$, such that:

1) $i_\cap(x; a_1, \ldots, a_r)$ satisfies the corresponding symmetry
and self-redundancy axioms as per $I_\cap(X; A_1, \ldots, A_r)$;

2) $I_\cap(X; A_1, \ldots, A_r) = \langle i_\cap(x; a_1, \ldots, a_r) \rangle$;

3) $i_\cap(x; a_1, \ldots, a_r)$ is once-differentiable with respect to
changes in $p(x, a_1, \ldots, a_r)$; and

4) $i_\cap(x; a_1, \ldots, a_r)$ is uniquely defined for the given
candidate redundancy measure.



Note that the self-redundancy axiom here means that
$i_\cap(x; a) = i(x; a)$; i.e. local self-redundancy is simply a local
MI. As such, the relevant local MI terms should be sums of
the relevant local PI-terms $i_\partial$. We recall that local MI terms
are unique, symmetric, and additive, whilst averaging to give
the relevant MI, and are once-differentiable with respect to
small changes in the PDFs [29], and the above axiom requires
several similar features. Now, there is no requirement for the
local values $i_\cap$ to satisfy monotonicity (unlike the average),
in a similar way to local MI values being able to increase or
decrease with the number of variables so long as the average
MI increases. Similarly, since local MI values can be negative,
then local redundancy and PI-terms may also be negative.
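To make the recalled properties of local MI concrete, the sketch below computes local MI values $i(x; a) = \log_2 [p(x, a) / (p(x) p(a))]$ from a joint distribution and averages them to recover the MI. The example distribution (Boolean OR of two equiprobable inputs, viewed from one input only) and the helper names are illustrative assumptions.

```python
from math import log2


def local_mi(joint):
    """joint maps (x, a) -> p(x, a); returns local MI values in bits."""
    px, pa = {}, {}
    for (x, a), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        pa[a] = pa.get(a, 0.0) + p
    return {(x, a): log2(p / (px[x] * pa[a]))
            for (x, a), p in joint.items() if p > 0}


def average_mi(joint):
    """Expectation of the local values over p(x, a), i.e. the MI in bits."""
    return sum(joint[xa] * value for xa, value in local_mi(joint).items())


# X = A1 OR A2 with equiprobable inputs, marginalised onto (x, a1)
JOINT_X_A1 = {(0, 0): 0.25, (1, 0): 0.25, (1, 1): 0.5}

if __name__ == "__main__":
    for xa, value in local_mi(JOINT_X_A1).items():
        print(xa, round(value, 3))
    print("average:", round(average_mi(JOINT_X_A1), 3))
```

The observation (x = 1, a1 = 0) has a negative local value (the source is misinformative there), yet the average over all observations is a non-negative MI.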

Sliding window methods are not local values, since they 
do not provide a value for a specific configuration (but are a 
function of the window as a whole). As such, the approach 
used in [20] is not an appropriate localization.

With regard to continuity of $i_\cap(x; a_1, \ldots, a_r)$, we note that,
from an information geometry perspective, the local value is
effectively a function of $d$ variables, where $d$ is the number
of degrees of freedom in defining $p(x, a_1, \ldots, a_r)$ in the
space of such probability distributions. The continuity of
$i_\cap(x; a_1, \ldots, a_r)$ can be thought of as being with respect to
these variables defining $p(x, a_1, \ldots, a_r)$. Notably, Shannon
required such continuity in defining the entropy [25].

Uniqueness of $i_\cap(x; a_1, \ldots, a_r)$ will depend on the specific
definition of the concrete redundancy measure.

Finally, we argue that the motivation for a redundancy
measure to satisfy localizability goes well beyond our desire
to measure information modification on a local scale. This
property would make the dynamics of any PI-term measurable
on a local scale in space and time, as for other measures.

B. Localising $I_{min}$

The straightforward way to localize $I_{min}$ for a specific
observation $\{x, a_1, \ldots, a_r\}$ of $\{X, A_1, \ldots, A_r\}$ is to take:

$$i_{min}(x; a_1, \ldots, a_r) = i(x; a_j) = \log_2 \frac{p(x \mid a_j)}{p(x)}, \qquad (21)$$

where $a_j$ is the specific value of $A_j$ in this observation, where:

$$A_j = \arg\min_{A_i} I(X = x; A_i). \qquad (22)$$

Recalling that $I_{min}$ is the "minimum information that any
source provides about each outcome" of the destination variable
"averaged over all possible outcomes" [11], here $i_{min}$
is the information provided about the destination observation
by the specific observation of the source $A_j$ which provides
the minimum information on average. This localization
averages directly over $p(x) p(a_j \mid x)$ (as per Eq. (14)) to give
$I_{min}(X; A_1, \ldots, A_r)$, and at first seems to satisfy our axiom.

However, it is simple to demonstrate that
$i_{min}(x; a_1, \ldots, a_r)$ is not once-differentiable with respect
to changes in the PDF $p(x, a_1, \ldots, a_r)$. Let us take the
Boolean OR function for binary variables, $X = A_1 + A_2$,
and assume that we have an almost equiprobable distribution
of the inputs $(A_1, A_2)$ as shown in Table II. A small
disturbance $\delta \to 0^+$ to the equiprobable distribution
is enough to ensure that $A_j = A_1$ is always selected
by the min function here,³ giving the local values for
redundancy $i_{min}(x; \{a_1\}, \{a_2\})$ displayed in Table II. If the
infinitesimal disturbance $\delta$ changes sign however (causing
a continuous change in the underlying PDF $p(x, a_1, a_2)$),
this flips the selection of $A_j$ to $A_2$, and discontinuously
swaps the local values of $i_{min}(x; \{a_1 = 0\}, \{a_2 = 1\})$ and
$i_{min}(x; \{a_1 = 1\}, \{a_2 = 0\})$. Also, with $\delta = 0$ there are
two possible solutions for the local values, meaning the
uniqueness requirement is not satisfied either. As such, this
localization for $I_{min}$ does not satisfy the localizability axiom.

TABLE II
Redundancy $i_\partial(x; \{a_1\}, \{a_2\}) = i_{min}(x; \{a_1\}, \{a_2\}) = i(x; a_j)$ for
the OR function $X = A_1 + A_2$, with an equiprobable input
distribution slightly disturbed by an infinitesimal $\delta \to 0^+$.

$a_1, a_2$   $x$   $p(a_1, a_2)$     $\arg\min I(X = x; A_j)$   $i(x; a_j)$
0, 0         0     0.25              $A_1$                      1
0, 1         1     0.25 + $\delta$   $A_1$                      -0.585
1, 0         1     0.25 - $\delta$   $A_1$                      0.415
1, 1         1     0.25              $A_1$                      0.415
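The sign-flip argument can be reproduced numerically. The sketch below is a minimal illustration, assuming a small finite disturbance (1e-6) in place of the infinitesimal one and using function names of our own choosing: it selects, for each outcome x, the source with the smaller specific information $I(X = x; A_j) = \sum_a p(a \mid x) \log_2 [p(x \mid a) / p(x)]$, and then reports the local value $i(x; a_j)$ for every input configuration.

```python
from math import log2


def or_joint(delta):
    """p(a1, a2) for the disturbed equiprobable inputs of Table II."""
    return {(0, 0): 0.25, (0, 1): 0.25 + delta,
            (1, 0): 0.25 - delta, (1, 1): 0.25}


def marginals(p_in, j):
    """Return p(x), p(a_j) and p(x, a_j) for X = A1 OR A2 (j is 0 or 1)."""
    px, pa, pxa = {}, {}, {}
    for (a1, a2), p in p_in.items():
        x, a = a1 | a2, (a1, a2)[j]
        px[x] = px.get(x, 0.0) + p
        pa[a] = pa.get(a, 0.0) + p
        pxa[(x, a)] = pxa.get((x, a), 0.0) + p
    return px, pa, pxa


def specific_info(p_in, j, x):
    """I(X = x; A_j): information source j provides about outcome x."""
    px, pa, pxa = marginals(p_in, j)
    return sum((p / px[x]) * log2((p / pa[a]) / px[x])
               for (xv, a), p in pxa.items() if xv == x and p > 0)


def local_i_min(delta):
    """Local values i(x; a_j), with A_j minimising I(X = x; A_j) for each x."""
    p_in = or_joint(delta)
    out = {}
    for (a1, a2) in p_in:
        x = a1 | a2
        j = min((0, 1), key=lambda s: specific_info(p_in, s, x))
        px, pa, pxa = marginals(p_in, j)
        a = (a1, a2)[j]
        out[(a1, a2)] = log2((pxa[(x, a)] / pa[a]) / px[x])
    return out


if __name__ == "__main__":
    for delta in (1e-6, -1e-6):
        values = local_i_min(delta)
        print(delta, {k: round(v, 3) for k, v in values.items()})
```

With the positive disturbance the values at (0, 1) and (1, 0) are approximately -0.585 and 0.415, as in Table II; with the negative disturbance they swap, even though the underlying distribution has changed by only 2e-6 in two entries.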

It is tempting to define $i_{min}$ as the minimum information
that any specific source observation provides about the
destination observation (i.e. taking the min of local values $i(x; a_j)$),
however this would not average over all observations to give
$I_{min}$. Aside from this, at this stage there are no other clear
meaningful candidates for localization of $I_{min}$.
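A quick check with the undisturbed OR example (equiprobable inputs, so $p(x = 0) = 1/4$ and $p(x = 1) = 3/4$) illustrates the mismatch. The arithmetic below is our own worked illustration, using the local MI values $1$, $\log_2(2/3) \approx -0.585$ and $\log_2(4/3) \approx 0.415$ that appear in Table II. Taking the minimum of the two local values at each of the four input configurations and averaging gives:

$$\left\langle \min_j i(x; a_j) \right\rangle = \tfrac{1}{4}\left[ 1 + 2 \log_2 \tfrac{2}{3} + \log_2 \tfrac{4}{3} \right] \approx 0.061 \text{ bits},$$

whereas taking the minimum of the specific information for each outcome $x$ first, as $I_{min}$ requires, gives:

$$I_{min}(X; A_1, A_2) = \tfrac{1}{4} \cdot 1 + \tfrac{3}{4}\left[ \tfrac{1}{3} \log_2 \tfrac{2}{3} + \tfrac{2}{3} \log_2 \tfrac{4}{3} \right] \approx 0.311 \text{ bits}.$$

The two differ because the per-observation minimum picks the misinformative value $-0.585$ at both mixed configurations (0, 1) and (1, 0), while the per-outcome minimum averages over each source's observations before comparing.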

C. Prospects with other candidate redundancy measures 

There is the prospect that alternate measures satisfying the
existing axioms for $I_\cap$ may satisfy the axioms we have laid out
above for localizing redundancy and information modification.
Two candidates here [21], [22] were proposed to address the
two-bit copy problem raised with $I_{min}$.

Griffith and Koch propose to measure the redundancy by
mapping the destination $X$ to a surrogate $X'$ which preserves
the information from each source $A_j$ to the surrogate, but
minimizes the overall mutual information from the sources to
the surrogate [21]. This method at first seems localizable (by
simply localizing the MI between the sources and the
surrogate), however as pointed out in [21] the mapping (i.e. PDFs)
to produce the minimal MI is not unique. As such, the method
does not immediately satisfy the uniqueness requirement for
our localizability axiom, though potentially extra conditions
could be added to the definition in future to meaningfully
uniquely identify the minimizing mapping.

Harder et al. [22] propose an information geometry based
approach. This involves projecting the conditional distributions
of the destination $X$ given each source $A_j$ onto each other in
the relevant information-geometric space. At first glance this
method seems localizable. However, it is not currently suitable
for our purposes in investigating information modification,
since it is currently only defined for a pair of sources. If it
can be extended to an arbitrary number of sources, it should
satisfy our requirement 4 in Section V-B for applicability to
capture information modification via a PI-diagram.

3 For $X = 0$, since $A_1 = 0$ (slightly) more often when $X \neq 0$, then $A_1$
tells us less specific information about $X$. Similarly, $A_1 = 1$ (slightly) less
often when $X = 1$, so again tells us less specific information about $X$.

VII. Conclusion 

We have described how frameworks for information dy- 
namics and partial information decomposition could be used 
together to describe the modification of information in dis- 
tributed computation. This involves examining the partial 
information diagram for the information storage and transfer 
sources to a destination, and then identifying synergies for pair 
interactions and above as information modification. 

We applied the $I_{min}$ measure of redundancy to cellular
automata in this fashion, and demonstrated that ordered CAs
have little modified information, the dynamics of chaotic CAs
are dominated by information modification, while complex
CAs have an intermediate level. It remains to be seen whether
the overall nature of these results would change if using an
alternative redundancy measure to $I_{min}$ (e.g. [21], [22]).

Examining the dynamics of such information modification
on a local scale in space and time requires localizability of
the given redundancy measure that one uses to compute the
PI-terms. We have suggested an axiom that such a measure
should satisfy for it to be localizable, and demonstrated that the
$I_{min}$ measure does not satisfy this axiom. Finally, we assessed
the potential for other candidate redundancy measures to be
applied to local information modification. We found that none
are suitable in their current form, but there is potential for
them to be extended to meet our requirements.